Abstract- The growth of the Internet has caused an
explosive expansion in the number of digital images and
has given people more ways to obtain them. As the
dissemination of video and image data in digital form
has grown, Content Based Image Retrieval (CBIR)
has become a prominent research topic. The
importance of an effective technique for searching and
retrieving images from a huge collection cannot be
overemphasized. An important problem that needs to
be addressed is therefore the fast retrieval of images
from large databases. Image retrieval systems search a
database to find images that are perceptually similar
to a query image. Content-based image retrieval uses
representations of features that are automatically
extracted from the images themselves.
I. INTRODUCTION

Images were traditionally managed by first
annotating their contents and then applying
text-retrieval techniques to index them. However, with the
growth of information in digital image format, some
drawbacks of this technique became apparent:

1. Manual annotation requires an enormous amount of
labor

2. Different people may perceive the contents of an
image differently; thus no objective keywords
for search can be defined

A new research field was born in the 1990s:
Content-based Image Retrieval, which aims at indexing and
retrieving images based on their visual contents.

CBIR, also known as Query By Image Content (QBIC),
comprises the technologies that organize digital
pictures by their visual features. These technologies
apply computer vision techniques to the image
retrieval problem in large databases. CBIR
consists of retrieving from a database of images the
images that are most visually similar to a given query
image.

The shortcomings of these systems are due both to the 
image representations they use and to their methods of 
accessing those representations to find images. The 
problems of image retrieval are becoming widely
recognized, and the search for solutions is an
increasingly active area for research and development.

In content-based visual retrieval, there are two 
fundamental challenges: 

Intention gap: The intention gap refers to the
difficulty that a user has in precisely expressing the
expected visual content through the query at hand, such
as an example image or a sketch map. It is compounded
by users' expectation of searching among a huge number
of objects. [14]

Semantic gap: The semantic gap is the lack of
coincidence between the information that one can
extract from the visual data and the interpretation
that the same data have for a user in a given
situation. The user seeks semantic similarity, but the
database can only provide similarity computed by data
processing. [15] The semantic gap originates from the
difficulty of describing high-level semantic concepts
with low-level visual features [1] [2] [3].

A. Open issues: 

1. Gap between low level features and high-level 
semantics 

2. Human in the loop - interactive systems 

3. Retrieval speed - most research prototypes can 
handle only a few thousand images. 

4. A reliable test-bed and measurement criterion. 
II. LITERATURE REVIEW

Most search engines (e.g., Google, Yahoo) are based
on semantic search, i.e., the user types in a
series of keywords and the images are also annotated
with keywords. The match is thus made primarily
through these keywords. In recent years, CBIR
systems have been developed to handle large image
databases effectively. Color, texture and shape are
the features mainly used for extracting similar images
from an image database. Different CBIR systems
have adopted different techniques.

Kamlesh Kumar et al. (2016) [4] propose a CBIR
method that uses the Gray Scale Weighted Average
method to reduce the feature vector dimension. The
proposed method is reported to be more suitable for
color and texture image feature analysis than the
color weighted average method described in the
literature. To show the effectiveness of the
retrieval approach, two common benchmark datasets,
Wang and the Amsterdam Library of Texture
Images (ALOT), were chosen for color and texture
respectively to evaluate the retrieval accuracy
and efficiency of each method.

Ekta Gupta et al. (2015) [5] present a CBIR method
using features such as colour and texture, called
WBCHIR (Wavelet Based Color Histogram Image
Retrieval). The texture and colour features are
extracted through wavelet transformation and
colour histogram, and the combination of these
features is robust to scaling and translation of
objects in an image.

Kavita Chauhan et al. (2015) [6] note that the growth
of digital images requires enhanced and efficient
techniques for sorting, browsing and searching
ever-growing image databases.
CBIR systems are search engines for image databases
that index images according to their
content and features. Their paper presents a
systematic review of various existing CBIR systems
and their feature extraction techniques, and
discusses the performance and limitations of these
systems.

Kannan (2010) [7] describes image mining as the main
concept for extracting potential information from an
image collection. For color-based image extraction the
RGB model is used, with the RGB components taken
from each image. Images are stored by the mean values
of the red, green and blue components of the target
images. The top-ranked images are further regrouped
according to texture features. The gray level
co-occurrence matrix (GLCM) is used for the texture
calculations (contrast, dissimilarity, homogeneity).
The images are classified into clusters with the help
of the GLCM as low texture, average texture and high
texture. Texture-based classification is reported to
be simple and efficient for real-time applications
compared with the entropy method. The authors also
evaluate performance with a precision versus recall
graph. A recall value of 1 can be reached simply by
retrieving all images, while precision can be kept
high by retrieving only a few images.
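
The precision and recall trade-off mentioned above can be
illustrated with a small calculation. The following is a
minimal sketch, not the evaluation code of the cited
authors; the function name `precision_recall` and the
image ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant images that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical setup: 4 relevant images in a database of 10.
relevant = {"img1", "img2", "img3", "img4"}
database = [f"img{i}" for i in range(1, 11)]

# Retrieving every image reaches full recall at low precision.
p_all, r_all = precision_recall(database, relevant)

# Retrieving only two images, both relevant, keeps precision high
# but lowers recall.
p_few, r_few = precision_recall(["img1", "img2"], relevant)

print(p_all, r_all)
print(p_few, r_few)
```

Plotting such pairs for growing result-list sizes gives the
precision versus recall graph.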

Silakari (2009) [8] proposes a framework for
unsupervised clustering of images based on the color
feature. Color moments and Block Truncation Coding
are used to extract features from an image database,
and the K-means clustering algorithm is applied to
group the dataset into various clusters.

Amanbir Sandhu and Aarti Kochhar (2012) [9] present a
technique for content based image retrieval
using texture, color and shape for image analysis.
They work with the three features, i.e.
texture, color and shape, and their different
combinations. The GLCM is used for texture feature
extraction and the histogram for color feature
extraction; for shape, factors such as area, Euler
number, eccentricity and filled area are computed.

Choras et al. [10] proposed an integrated color, 
texture and shape feature extraction method in which 
Gabor filtration is used for determining the number of 
regions of interest (ROIs). They calculated texture and 
color features from the ROIs based on threshold 
Gabor features and histograms, color moments in 
YUV space, and shape features based on Zernike 
moments. The features presented proved to be 
efficient in determining similarity between images. 

Maheshwari et al. [11] used color moments and
Gabor filters to extract features from an image
dataset. In their study, K-means and hierarchical
clustering algorithms are applied to group the image
dataset into various clusters.

In [12], a novel CBIR system based on three methods
is proposed. First, a color co-occurrence matrix (CCM)
is computed as the color feature. The CCM is used to
analyze the probability that a pixel and its adjacent
pixels in the image have the same color. Second, the
difference between pixels of scan patterns (DBPSP) is
computed to capture the variance among all pixels of
the scan patterns. The third method is color
distribution for the K-means algorithm. It is based on
a color histogram in which each pixel color is
replaced by the most similar color among the existing
ones. The K-means algorithm divides all the pixels
into k clusters.

A CBIR system based on three algorithms, viz. feature
extraction, image mining and rule-based filtering, is
proposed in [13]. The first algorithm extracts
the color and texture features globally from the
image. These features are considered
invariant to image transformations and usable
for the detection of objects. The second algorithm
uses an image mining method that applies a clustering
algorithm to retrieve hidden knowledge from the
image. The third algorithm uses rules based on
relevance feedback to filter the results and to improve
the clusters.

III. OVERVIEW OF CBIR SYSTEM 

Content-based image retrieval uses the visual contents
of an image such as texture, color, shape, and spatial
layout to represent and index the image. In typical
CBIR systems, the visual contents of the images in the
database are extracted and described by
multi-dimensional feature vectors. The feature vectors
of the images in the database form a feature database.
To retrieve images, users provide the retrieval system
with example images. The system then converts these
examples into its internal representation of feature
vectors.



Figure 1: Overview of CBIR System 

The diagram above gives an overview of the
Content Based Image Retrieval (CBIR) system. Each
block of the figure describes a particular process in
the system. As shown in the figure, the feature
vectors of all images are stored in a database
called the feature database. The corresponding
feature vector of the query image is extracted and
compared with all the feature vectors stored in the
database using a suitable similarity measure,
and relevant images are retrieved on the
basis of a predefined threshold.
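
The offline part of this pipeline, building the feature
database, can be sketched as follows. This is an
illustrative sketch only: it assumes, purely for brevity,
that the feature vector is the mean of the R, G and B
values, and the names `mean_rgb`, `build_feature_database`
and the sample images are hypothetical.

```python
def mean_rgb(pixels):
    """Feature vector: mean R, G and B over a list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def build_feature_database(images):
    """Map each image id to its feature vector (the feature database)."""
    return {image_id: mean_rgb(pixels) for image_id, pixels in images.items()}


# Hypothetical two-pixel "images", keyed by image id.
images = {
    "red_patch": [(250, 10, 10), (230, 30, 20)],
    "blue_patch": [(10, 20, 240), (30, 20, 220)],
}

feature_db = build_feature_database(images)

# The query image goes through the same feature extraction.
query_features = mean_rgb([(240, 20, 20), (240, 20, 10)])

print(feature_db)
print(query_features)
```

Because the query and the database images pass through the
same extraction function, their feature vectors are directly
comparable.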

A. Components of CBIR System 


The CBIR system consists of the following 
components: 

a. Query image 

It is the image to be searched for in the image
database, to determine whether similar images are
present and how many of them exist.

b. Image database

It consists of n images, where n depends on the
user's choice.

c. Feature extraction

It extracts visual information from the image and
saves it as feature vectors in a feature database.
Feature extraction describes the image detail in the
form of a feature value (or a set of values called a
feature vector) for each pixel. These feature vectors
are used to compare the query image with the other
images for retrieval.

d. Image matching

The information about each image is stored in its
feature vector, and these feature vectors are
compared with the feature vector of the query image
to measure similarity.

e. Resultant retrieved images

The system searches the previously stored information
to find the matching images in the database. The
output is the set of similar images whose features
are the same as or closest to those of the query
image.

IV. FEATURE VECTORS 

In pattern recognition and machine learning, a feature 
vector is an n-dimensional vector of numerical 
features that represent some object. Many algorithms 
in machine learning require a numerical 
representation of objects, since such representations 
facilitate processing and statistical analysis. When
representing images, the feature values might
correspond to the pixels of an image; when
representing texts, perhaps to term occurrence
frequencies.

Some important features of the image are as
follows:

1. Mean: The mean is the average gray level of the
pixels in the image.

2. Variance: The variance is a measure of
dispersion. It tells us something about the scatter
of scores (here pixels) around the mean. It is
defined as the mean squared deviation from the
mean, and symbolized by a small sigma squared.
Its formula is:

$\sigma^2 = \sum (X - M)^2 / N$

where X = pixel value,
M = mean of the image,
N = total number of pixels.

3. Standard Deviation:

The standard deviation is the square root of the
variance and is symbolized by a small Greek
sigma, $\sigma$. Its formula is the square root of any of
the formulae for the variance, e.g.
$\sigma = \sqrt{\sum (X - M)^2 / N}$

4. Entropy:

Entropy is a statistical measure of randomness
that can be used to characterize the texture of the
input image.

5. Texture:

Texture is that innate property of all surfaces that 
describes visual patterns, each having properties 
of homogeneity. It contains important information
about the structural arrangement of the surface,
such as clouds, leaves, bricks, fabric, etc. It also
describes the relationship of the surface to the 
surrounding environment. In short, it is a feature 
that describes the distinctive physical composition 
of a surface. 

Texture properties include: Coarseness, Contrast, 
Directionality, Line-likeness, Regularity, and 
Roughness in the image. 

Texture is one of the most important defining 
features of an image. It is characterized by the 
spatial distribution of gray levels in a 
neighborhood. In order to capture the spatial 
dependence of gray-level values, which contribute 
to the perception of texture, a two-dimensional 
dependence texture analysis matrix is taken into 
consideration. This two-dimensional matrix is
obtained by decoding the image file (JPEG, BMP,
etc.).





Figure 2: Examples of Textures

6. Color: 


One of the most important features that make 
possible the recognition of images by humans is 
color. Color is a property that depends on the 
reflection of light to the eye and the processing of 
that information in the brain. We use color every 
day to tell the difference between objects, places, 
and the time of day. Usually colors are defined in
three-dimensional color spaces. These could be
RGB (Red, Green, and Blue), HSV (Hue,
Saturation, and Value) or HSB (Hue, Saturation,
and Brightness). The last two are based on
the human perception of hue, saturation, and
brightness.

7. Shape : 

Shape may be defined as the characteristic surface 
configuration of an object; an outline or contour. 
It permits an object to be distinguished from its 
surroundings by its outline. Shape representations 
can be generally divided into two categories: 

• Boundary-based, and 

• Region-based. 


Figure 3: Boundary-based & Region-based
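
The statistical features listed in this section (mean,
variance, standard deviation and entropy) can be computed
directly from the gray levels. The sketch below assumes a
flat list of integer gray levels and takes entropy to be
the Shannon entropy of the gray-level histogram in bits,
which is a common choice but an assumption here;
`gray_stats` is a hypothetical name.

```python
import math
from collections import Counter


def gray_stats(pixels):
    """Return (mean, variance, std, entropy) of a list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Mean squared deviation from the mean (population variance).
    variance = sum((x - mean) ** 2 for x in pixels) / n
    std = math.sqrt(variance)
    # Shannon entropy of the gray-level histogram, in bits (assumed definition).
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return mean, variance, std, entropy


# Hypothetical 4-pixel image: half black, half white.
pixels = [0, 0, 255, 255]
mean, variance, std, entropy = gray_stats(pixels)

print(mean, variance, std, entropy)
```

A uniform image has zero variance and zero entropy, while an
image whose gray levels are spread evenly over many values
has high entropy.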

V. WORKING 

1. CBIR retrieves digital images similar to the
query image from a large database according to
their content.

2. When the user gives a query image, the Gabor
transform is applied to the image.

3. Features (Mean, Entropy & Standard Deviation)
of the transformed image are calculated.

4. Next, the query image is split into R, G and B
planes.

5. Features (Mean, Entropy & Standard Deviation)
of each plane are calculated again.

6. The query image is then transformed with a
1-level discrete wavelet transform (DWT).

7. Again, the features of each component of the
transformed image are calculated.

8. Now the same procedure is applied to each
image of the database.

9. The Feature Vector of the database images (Fd) 
is obtained. 

10. The Feature Vector of the query image (Fq) is 
obtained. 

11. Similarity is measured using the Euclidean
distance between the feature vectors of the database
images and the query image.

12. Images whose distance is less than the predefined
threshold are retrieved from the image database.
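
The final matching steps can be sketched as follows. The
sketch assumes that a smaller Euclidean distance means a
more similar image, so images within the threshold are
kept. The names `euclidean` and `retrieve`, the threshold
value, and the sample vectors standing in for Fd and Fq
are hypothetical, and the Gabor and DWT feature
extraction is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(fq, fd, threshold):
    """Return ids of database images within `threshold` of the query,
    ordered from most to least similar."""
    scored = [(euclidean(fq, vec), image_id) for image_id, vec in fd.items()]
    return [image_id for dist, image_id in sorted(scored) if dist <= threshold]


# Hypothetical feature database Fd and query feature vector Fq.
fd = {
    "a": (1.0, 2.0, 2.0),
    "b": (4.0, 6.0, 2.0),
    "c": (10.0, 10.0, 10.0),
}
fq = (1.0, 2.0, 2.0)

result = retrieve(fq, fd, threshold=6.0)
print(result)
```

In practice the individual features have different ranges,
so normalizing each component before computing the distance
is a common design choice.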

CONCLUSION 

CBIR is a fast-developing technology with
considerable potential. The dramatic growth of
digital media at home, in enterprises, and on the web
has, over the last decade, spawned great
interest in developing methods for effective indexing
and searching of desired visual content in order to
unlock the value of that content. CBIR is a
sub-problem of content-based retrieval (CBR). It is
necessary to develop effective tools for
retrieving images from the web, where the number and
size of digital image collections are growing fast.
Content-based image retrieval is a hybrid research area
that requires knowledge of both computer vision and
database systems. CBIR has applications in many
fields, such as blood cell detection,
archeology, criminal investigation, image search,
social networking sites, forensic labs, and satellite
imagery. The field appears to be generating
interesting and valid results, even though it has so
far led to few commercial applications.